fix: parallel_plan and parallel_apply#6427
fix: parallel_plan and parallel_apply#6427nvanheuverzwijn wants to merge 1 commit intorunatlantis:mainfrom
Conversation
Signed-off-by: Nicolas Vanheuverzwijn <nicolas.vanheu@gmail.com>
There was a problem hiding this comment.
Pull request overview
Fixes a regression where parallel_plan / parallel_apply effectively serialized execution by removing a workspace-level Git read lock that was held for the full duration of step execution.
Changes:
- Removed
WorkingDir.GitReadLock()acquisition fromDefaultProjectCommandRunner.runSteps()to allow multiple project steps to run concurrently within the same PR/workspace. - Relies on existing per-project
WorkingDirLockerfor directory-level mutual exclusion while enabling higher concurrency across projects.
| @@ -830,10 +830,6 @@ func (p *DefaultProjectCommandRunner) doStateRm(ctx command.ProjectContext) (out | |||
| func (p *DefaultProjectCommandRunner) runSteps(steps []valid.Step, ctx command.ProjectContext, absPath string) ([]string, error) { | |||
| var outputs []string | |||
|
|
|||
There was a problem hiding this comment.
Removing the WorkingDir.GitReadLock() around step execution means Clone()/MergeAgain() (which take a git write lock and can reset/merge the workspace) may now run concurrently with terraform steps in the same PR/workspace. That can mutate the working tree mid-plan/apply, leading to inconsistent results or intermittent failures under parallel execution.
Consider keeping a shared read lock while running steps, and instead addressing the parallelism bottleneck by changing WorkingDir.Clone/MergeAgain to avoid taking the write lock when they are effectively no-ops (e.g., acquire a read lock to check whether the workspace is already at the desired ref and only upgrade to a write lock when an update/merge is actually needed), or by cloning/merging once per PR/workspace before starting parallel project workers.
| unlock := ctx.WorkingDir.GitReadLock() | |
| defer unlock() |
There was a problem hiding this comment.
I beleive we already get a proper lock with
// Acquire internal lock for the directory we're going to operate in.
unlockFn, err := p.WorkingDirLocker.TryLock(ctx.Pull.BaseRepo.FullName, ctx.Pull.Num, ctx.Workspace, ctx.RepoRelDir, ctx.ProjectName, command.Plan)
if err != nil {
return nil, "", err
}
defer unlockFn()
in doPlan and doApply.
|
closed in favor of #6376 |
what
Fix the parallel plan and apply function.
why
Atlantis should run in parallel.
tests
I tested the change with my test project, which has more than 50 project in it. Without this change, it takes around 15minutes to run the entire thing. With my change, this is significantly reduce to approximately 3minutes, running 15 worker in parallel.
references
closes #6426